    Clustering of Vibrio parahaemolyticus Isolates Using MLST and Whole-Genome Phylogenetics and Protein Motif Fingerprinting

    Vibrio parahaemolyticus is a ubiquitous and abundant member of native microbial assemblages in coastal waters and shellfish. Though V. parahaemolyticus is predominantly environmental, some strains have infected human hosts and caused outbreaks of seafood-related gastroenteritis. To understand differences among clinical and environmental V. parahaemolyticus strains, we used high-quality DNA sequencing data to compare the genomes of V. parahaemolyticus isolates (n = 43) from a variety of geographic locations and clinical and environmental sample matrices. We used phylogenetic trees inferred from multilocus sequence typing (MLST) and whole-genome (WG) alignments, as well as a novel classification and genome clustering approach that relies on protein motif fingerprints (MFs), to assess relationships between V. parahaemolyticus strains and identify novel molecular targets associated with virulence. Differences in strain clustering at more than one position were observed between the MLST and WG phylogenetic trees. The WG phylogeny had higher support values and better strain resolution, because isolates of the same sequence type could be differentiated. The MF analysis revealed groups of protein motifs that were associated with the pathogenic MLST type ST36 and with a large group of clinical strains isolated from human stool. A subset of the stool- and ST36-associated protein motifs was selected for further analysis, and the motif sequences were found in genes with a variety of functions, including transposases, secretion system components and effectors, and hypothetical proteins. DNA sequences associated with these protein motifs are candidate targets for future molecular assays to improve surveys of pathogenic V. parahaemolyticus in the environment and seafood.
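    The abstract does not spell out how motif fingerprints are built or compared. As a minimal sketch, assuming each genome is reduced to a set of motif identifiers, Jaccard similarity (a swapped-in measure, not necessarily the authors') gives a pairwise score that a clustering step could use, and a simple set filter lists motifs found in every member of a group (for example, the ST36 strains) and in no other genome. The strain IDs, motif names and function names below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical input: strain ID -> set of protein motif identifiers found in its genome.
Fingerprints = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two motif sets (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_associated_motifs(fps: Fingerprints, group: Set[str]) -> Set[str]:
    """Motifs present in every genome of `group` and absent from all other genomes."""
    if not group:
        return set()
    shared = set.intersection(*(fps[g] for g in group))
    elsewhere = set().union(*(m for g, m in fps.items() if g not in group))
    return shared - elsewhere


# Toy fingerprints (hypothetical strains and motifs).
fps = {
    "strain_1": {"m1", "m2", "m3"},
    "strain_2": {"m1", "m2", "m5"},
    "strain_3": {"m3", "m4"},
}
print(sorted(group_associated_motifs(fps, {"strain_1", "strain_2"})))  # ['m1', 'm2']
print(jaccard(fps["strain_1"], fps["strain_2"]))  # 0.5
```

    A strict "all members, no outsiders" filter is the simplest notion of group association; real data would likely call for a frequency threshold or a statistical test instead.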

    Transcriptome sequencing and development of an expression microarray platform for the domestic ferret

    Background: The ferret (Mustela putorius furo) is an attractive animal model for the study of respiratory diseases, including influenza. Despite its importance for biomedical research, the number of reagents available for molecular and immunological analysis is limited. We present here a parallel sequencing effort to produce an extensive EST (expressed sequence tag) dataset derived from a normalized ferret cDNA library made from mRNA from ferret blood, liver, lung, spleen and brain.
    Results: We produced more than 500,000 sequence reads that were assembled into 16,000 partial ferret genes. These genes were combined with the ferret sequences available in GenBank to develop a ferret-specific microarray platform. Using this array, we detected tissue-specific expression patterns, which were confirmed by quantitative real-time PCR assays. We also present a set of 41 ferret genes with even transcription profiles across the tested tissues, indicating their usefulness as housekeeping genes.
    Conclusion: The tools developed in this study allow for functional genomic analysis and make further development of reagents for the ferret model possible.
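    The abstract does not say how "even transcription profiles" were defined. One common way to screen for housekeeping-gene candidates is a low coefficient of variation (standard deviation divided by mean) of expression across tissues; the sketch below uses that criterion as an assumption, not as the authors' method. The gene names, expression values and the 0.1 cutoff are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def stable_genes(expr: Dict[str, List[float]], max_cv: float = 0.1) -> List[str]:
    """Return genes whose coefficient of variation across tissues is at most max_cv."""
    selected = []
    for gene, values in expr.items():
        mu = mean(values)
        if mu <= 0:  # CV is undefined or meaningless for non-positive means
            continue
        cv = pstdev(values) / mu  # population SD relative to the mean
        if cv <= max_cv:
            selected.append(gene)
    return sorted(selected)


# Hypothetical expression values, one per tested tissue.
expr = {
    "geneA": [10.0, 10.0, 10.0, 10.0, 10.0],
    "geneB": [1.0, 20.0, 5.0, 3.0, 8.0],
    "geneC": [9.5, 10.5, 10.0, 10.0, 10.0],
}
print(stable_genes(expr))  # ['geneA', 'geneC']
```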

    P-tree Classification of Yeast Gene Deletion Data

    Genomic data have many properties that distinguish them from "typical" relational data. The presence of multi-valued attributes, as well as the large number of null values, led us to a P-tree-based bit-vector representation in which matching 1-values were counted to evaluate similarity between genes. Quantitative information, such as the number of interactions, was also included in the classifier. Interaction information allowed us to extend the known properties of one protein with information on its interacting neighbors. Different feature attributes were weighted independently, and the relevance of each attribute was systematically evaluated by optimizing these weights with a genetic algorithm. The AROC value (area under the ROC curve) for the classified list was used as the fitness function for the genetic algorithm.
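    The similarity measure and fitness function described above can be sketched as follows, assuming each attribute of a gene is stored as a plain Python integer used as a bitset (the P-tree compression itself is not reproduced here) and that a null value is simply an attribute with no 1-bits. The attribute names, the query/candidate setup and the helper names are hypothetical; a genetic algorithm would call `fitness` on each candidate weight vector and keep the fittest.

```python
from typing import Dict, Sequence

# Hypothetical layout: attribute name -> bitset (Python int).
# A null value is an attribute with no 1-bits, or a missing key.
Gene = Dict[str, int]


def weighted_match_similarity(a: Gene, b: Gene, weights: Dict[str, float]) -> float:
    """Sum over attributes of weight * number of 1-bits set in both genes."""
    score = 0.0
    for attr, w in weights.items():
        common = a.get(attr, 0) & b.get(attr, 0)  # bits that are 1 in both genes
        score += w * bin(common).count("1")
    return score


def aroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve: chance a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(weights: Dict[str, float], query: Gene,
            candidates: Sequence[Gene], labels: Sequence[bool]) -> float:
    """Fitness of one weight vector: AROC of candidates ranked by similarity to query."""
    scores = [weighted_match_similarity(query, c, weights) for c in candidates]
    return aroc(scores, labels)


# Toy example (hypothetical attributes and values).
query = {"function": 0b1011, "complex": 0b01}
candidates = [
    {"function": 0b1010, "complex": 0b01},  # shares 2 function bits and 1 complex bit
    {"function": 0b0100, "complex": 0b10},  # shares nothing
    {"function": 0b0011},                   # shares 2 function bits; complex is null
]
labels = [True, False, True]  # True = candidate has the property being predicted
weights = {"function": 1.0, "complex": 2.0}
print(fitness(weights, query, candidates, labels))  # 1.0
```

    Counting only matching 1-bits means that null values never add spurious similarity, which is one reason a bit-vector encoding suits sparse, multi-valued attributes.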